LESO: A ten-year ensemble of satellite-derived intercontinental hourly surface ozone concentrations

This study presents a novel ensemble of surface ozone (O3) generated by the LEarning Surface Ozone (LESO) framework. The aim of this study is to investigate the spatial and temporal variation of surface O3. The LESO ensemble provides unique and accurate hourly (daily/monthly/yearly as needed) O3 surface concentrations on a fine spatial resolution of 0.1° × 0.1° across China, Europe, and the United States over a period of 10 years (2012–2021). The LESO ensemble was generated by establishing the relationship between surface O3 and satellite-derived O3 total columns together with high-resolution meteorological reanalysis data. This breakthrough overcomes the challenge of retrieving O3 in the lower atmosphere from satellite signals. A comprehensive validation indicated that the LESO datasets explained approximately 80% of the hourly variability of O3, with a root mean squared error of 19.63 µg/m³. The datasets convincingly captured the diurnal cycles, weekend effects, seasonality, and interannual variability, which can be valuable for research and applications related to atmospheric and climate sciences.


Results
Regarding the spatial distribution shown in Figure S1, the 4° × 5° resolution of the GEOS-Chem model was coarse. Consistent with previous studies 4,6, this model showed a clear tendency toward overestimation. Meanwhile, the EAC4 model yielded significantly lower estimates than LESO; however, it exhibited a noticeable level of spatial heterogeneity. The spatial features of CMAQ closely resembled those of LESO, yet CMAQ captured considerably more spatial detail. Overall, the spatial distribution was generally consistent among LESO, CMAQ, and EAC4.
The slope of the linear regression (Figure S2) shows that the surface O3 concentrations modeled by EAC4 were notably lower than those from LESO, particularly in China. This discrepancy suggests a tendency towards underestimation when the LESO results are taken as the benchmark. Specifically, in Europe and the US, the EAC4-modeled O3 concentrations amounted to 56% of those predicted by LESO, whereas in China this value fell to 42%. Conversely, LESO and CMAQ agreed closely, with no significant overestimation or underestimation.
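The slope comparison above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of a comparison model against LESO on co-located grid cells; the source does not state whether an intercept was fitted or how the cells were paired, so both choices here are assumptions. The function name and all numbers are hypothetical and synthetic.

```python
from statistics import fmean


def ols_slope_intercept(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x, mean_y = fmean(x), fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic, illustrative concentrations (µg/m3); not data from the study.
leso = [40.0, 60.0, 80.0, 100.0, 120.0]
other = [0.5 * v + 2.0 for v in leso]  # hypothetical model biased low

slope, intercept = ols_slope_intercept(leso, other)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A slope well below 1 in such a fit would correspond to the comparison model returning only a fraction of the reference concentrations, which is how the 56% and 42% figures can be read.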
The evaluation of model O3 estimates against site measurements reveals distinct patterns (Figure S3). The median coefficient of determination (median R²) varied significantly across models and regions, especially when comparing sites within the same geographic region. For instance, in both China and Europe, the median R² for the EAC4 model was approximately 0.6, indicating moderate agreement between the model estimates and measurements. In contrast, this value dipped below 0.4 in the US, indicating weaker agreement between EAC4 estimates and observed values. The median values of the root mean squared error (RMSE) in these three regions were approximately 30 µg/m³, 20 µg/m³, and 20 µg/m³, respectively.
Conversely, the CMAQ model performed better in the US. The median R² for CMAQ was around 0.45, which exceeded that of EAC4 and indicates a relatively stronger correlation between the CMAQ estimates and actual measurements. Moreover, the median RMSE for CMAQ remained below 15 µg/m³, notably smaller than the corresponding value for EAC4. It is worth highlighting that LESO showed exceptionally good agreement with in-situ measurements.
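The site-level metrics discussed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: R² is computed here as 1 − SS_res/SS_tot (the source does not say whether this form or the squared Pearson correlation was used), and the site names and values are synthetic.

```python
from math import sqrt
from statistics import fmean, median


def rmse(obs, est):
    """Root mean squared error between observations and estimates."""
    return sqrt(fmean([(e - o) ** 2 for o, e in zip(obs, est)]))


def r_squared(obs, est):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_obs = fmean(obs)
    ss_res = sum((o - e) ** 2 for o, e in zip(obs, est))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Synthetic (observed, estimated) O3 series per site in µg/m3; hypothetical.
sites = {
    "site_a": ([50.0, 60.0, 70.0, 80.0], [52.0, 58.0, 72.0, 78.0]),
    "site_b": ([30.0, 40.0, 50.0, 60.0], [35.0, 45.0, 55.0, 65.0]),
    "site_c": ([20.0, 40.0, 60.0, 80.0], [30.0, 30.0, 70.0, 70.0]),
}

site_rmse = [rmse(obs, est) for obs, est in sites.values()]
site_r2 = [r_squared(obs, est) for obs, est in sites.values()]

# Summarise each metric by its median across sites.
median_rmse = median(site_rmse)
median_r2 = median(site_r2)
print(f"median RMSE = {median_rmse:.2f} µg/m3, median R2 = {median_r2:.3f}")
```

Computing the metrics per site first and then taking the median keeps a few poorly modeled sites from dominating the regional summary.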
As indicated by the site-level validations (Figure S4), R² differed little between estimates made with and without satellite columns. The interquartile range of the difference was less than 0.05. Correspondingly, the difference in RMSE was insignificant; both the median and the interquartile range of the RMSE difference were approximately zero. In line with our previous studies 7,8, the inclusion of satellite-derived data had a limited effect on the accuracy of surface O3 estimation using machine learning algorithms. Such estimation was accomplished by extrapolating measurements from specific monitoring sites.
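The with/without comparison can be summarised as a paired difference per site, as in the minimal sketch below. It assumes that each site has one R² value from each configuration and that the interquartile range is taken over the per-site differences; the quartile method ("inclusive") is also an assumption, and all values are synthetic.

```python
from statistics import median, quantiles

# Synthetic per-site R2 values, illustrative only; not results from the study.
r2_with = [0.82, 0.79, 0.85, 0.80, 0.77, 0.83, 0.81]     # satellite columns used
r2_without = [0.81, 0.80, 0.83, 0.80, 0.76, 0.80, 0.80]  # satellite columns omitted

# Paired difference at each site (with minus without).
diffs = [w - wo for w, wo in zip(r2_with, r2_without)]

# Quartiles of the differences; n=4 returns the three cut points Q1, Q2, Q3.
q1, _, q3 = quantiles(diffs, n=4, method="inclusive")
iqr = q3 - q1
median_diff = median(diffs)

print(f"median difference = {median_diff:.3f}, IQR = {iqr:.3f}")
```

The same calculation applies to RMSE by swapping in per-site RMSE values for the two configurations.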

Discussion
More broadly, a certain level of coherence can be observed among machine learning, reanalysis, and chemical transport models. However, significant disparities become evident when examining their effectiveness in capturing spatial variations of surface O3, especially when compared to in-situ measurements. With LESO as the reference standard, the GEOS-Chem model used in this study shows significant overestimations, which aligns with previous research findings. Conversely, outcomes from EAC4 indicate significant underestimations. The CMAQ model, in contrast, aligns more closely with the LESO ensemble. It provides a more intricate depiction, although it still has limitations in accurately capturing O3 variations when compared to site-level measurements.
This comparison further underscores the inherent limitations of the LESO model. While it agrees with site-level measurements, its representation of spatial variability is still lacking. This shortcoming may be attributed to the exclusive reliance of the LESO model on satellite and meteorological data (Figure S4). Currently, we have included the satellite measurements as model inputs temporarily, with the intention of replacing them with more reliable information in future updates. Enhancing the semi-empirical nature of the LESO model could involve incorporating practical constraints, such as the use of inventory data similar to those employed in CTMs, thereby improving its realism and applicability.

Figure S4. Differences in the accuracy of surface O3 estimation with and without satellite columns, represented by the determination coefficient (R²) (a) and the root mean squared error (RMSE) (b). In these boxplots, the central horizontal line represents the mean concentration, while the triangles symbolize the median value.